Computer Vision Project submitted for PGP-AIML Great Learning on 01-May-2022
DOMAIN: Botanical Research
CONTEXT: University X is conducting research to understand the characteristics of plants and plant seedlings at various stages of growth. They have already invested in curating sample images. They require an automated solution that can determine a plant's species from a photo.
• DATA DESCRIPTION: The dataset comprises images from 12 plant species.
Source: https://www.kaggle.com/c/plant-seedlings-classification/data
• PROJECT OBJECTIVE: To create a classifier capable of determining a plant's species from a photo.
1. Import and Understand the data [12 Marks]
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import collections
import seaborn as sns
import pandas as pd
from sklearn.metrics import mean_squared_error, confusion_matrix, classification_report, roc_curve, precision_recall_curve, roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Activation, LeakyReLU
from keras import optimizers
# Import label encoder
from sklearn import preprocessing
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from tensorflow import keras
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from keras.layers import InputLayer, BatchNormalization
from tensorflow.keras import layers
import cv2
import glob
import os
A. Extract ‘plant-seedlings-classification.zip’ into a new folder (unzipped) using Python. [2 Marks]
Hint: You can extract it manually by losing 2 marks.
## Extract the file from zip file
import zipfile
zip_reference = zipfile.ZipFile('plant-seedlings-classification.zip', 'r')
## Extract to new folder called unzipped
zip_reference.extractall('unzipped') # unzip directory
zip_reference.close()
The unzipped folder is created in the file structure and all the files are extracted into it.
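As a side note, the same extraction can be written with a context manager so the archive is closed automatically, even if extraction raises an error (a minimal sketch; the function name is ours):

```python
import zipfile

def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir.

    The with-block guarantees the archive handle is closed on exit,
    so no explicit close() call is needed.
    """
    with zipfile.ZipFile(zip_path, 'r') as zf:
        zf.extractall(dest_dir)
```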
B. Map the images from the train folder with the train labels to form a DataFrame. [6 Marks]
main_dir = 'unzipped/plant-seedlings-classification'
train_dir = main_dir +'/train'
classes=os.listdir(train_dir)
classes.remove('.DS_Store')
len(classes)
12
classes
['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet']
def createDataFrame(directoryName):
    data = pd.DataFrame()
    for (root, dirs, files) in os.walk(directoryName + '/'):
        for f in files:
            if f == '.DS_Store':  # ignore the macOS .DS_Store file
                continue
            species_name = root.split('/')[-1]  # folder name is the species label
            image = cv2.imread(root + '/' + f)
            data = data.append({'image_path': root + '/' + f,
                                'image_name': f,
                                'species_name': species_name,
                                'actual_img': [image]}, ignore_index=True)
    return data
seedling_df=createDataFrame(train_dir)
### Test Snippet to store the image in data frame and displaying it
test_df=pd.DataFrame()
test_img=cv2.imread('unzipped/plant-seedlings-classification/train/Black-grass/0050f38b3.png')
test_df['img']=[test_img]
test_df
| img | |
|---|---|
| 0 | [[[27, 50, 80], [18, 42, 71], [36, 57, 83], [45, 66, 89], [50, 71, 92], [61, 81, 100], [68, 88, ... |
test_df.iloc[0]['img']
array([[[ 27, 50, 80],
[ 18, 42, 71],
[ 36, 57, 83],
...,
[ 58, 85, 93],
[ 64, 93, 99],
[ 48, 82, 88]],
[[ 20, 45, 77],
[ 23, 46, 78],
[ 39, 59, 86],
...,
[ 63, 85, 94],
[ 59, 82, 91],
[ 49, 75, 84]],
[[ 20, 45, 78],
[ 26, 50, 82],
[ 39, 59, 87],
...,
[ 63, 81, 92],
[ 58, 77, 89],
[ 59, 78, 90]],
...,
[[154, 147, 141],
[159, 153, 146],
[155, 149, 142],
...,
[101, 96, 94],
[ 65, 59, 64],
[ 57, 50, 56]],
[[155, 149, 142],
[156, 150, 143],
[155, 149, 141],
...,
[ 94, 88, 86],
[ 72, 66, 70],
[ 64, 58, 62]],
[[156, 149, 141],
[157, 151, 143],
[155, 149, 140],
...,
[ 97, 91, 89],
[ 72, 66, 69],
[ 60, 54, 58]]], dtype=uint8)
import matplotlib.image as image
image_det = seedling_df.iloc[500]['image_path']
print(image_det)
img = test_df.iloc[0]['img']
plt.grid(False)
plt.title("Test")
plt.imshow(img)
plt.show()
unzipped/plant-seedlings-classification/train/Charlock/a30113dfc.png
seedling_df.shape
(4750, 4)
seedling_df.head()
| actual_img | image_name | image_path | species_name | |
|---|---|---|---|---|
| 0 | [[[[27 50 80], [18 42 71], [36 57 83], [45 66 89], [50 71 92], [ 61 81 100], [ 68 88 106], [ 6... | 0050f38b3.png | unzipped/plant-seedlings-classification/train/Black-grass/0050f38b3.png | Black-grass |
| 1 | [[[[37 43 55], [37 43 54], [40 46 57], [41 47 58], [48 53 64], [46 52 63], [47 52 64], [48 53 66... | 0183fdf68.png | unzipped/plant-seedlings-classification/train/Black-grass/0183fdf68.png | Black-grass |
| 2 | [[[[24 32 45], [21 30 44], [22 30 45], [24 31 46], [29 35 49], [25 32 44], [21 31 40], [24 34 42... | 0260cffa8.png | unzipped/plant-seedlings-classification/train/Black-grass/0260cffa8.png | Black-grass |
| 3 | [[[[ 51 84 108], [ 56 89 112], [ 54 88 110], [ 55 89 111], [ 55 89 112], [ 55 90 113], [ 5... | 05eedce4d.png | unzipped/plant-seedlings-classification/train/Black-grass/05eedce4d.png | Black-grass |
| 4 | [[[[165 162 162], [165 161 163], [160 157 158], [163 160 160], [162 161 159], [163 162 159], [16... | 075d004bc.png | unzipped/plant-seedlings-classification/train/Black-grass/075d004bc.png | Black-grass |
seedling_df.tail()
| actual_img | image_name | image_path | species_name | |
|---|---|---|---|---|
| 4745 | [[[[98 94 97], [93 88 94], [87 82 86], [82 77 80], [80 76 79], [83 80 83], [78 76 80], [73 71 78... | fc293eacb.png | unzipped/plant-seedlings-classification/train/Sugar beet/fc293eacb.png | Sugar beet |
| 4746 | [[[[35 63 92], [38 67 96], [34 64 94], [17 51 84], [ 8 43 76], [19 48 80], [31 55 85], [30 52 83... | fc441208c.png | unzipped/plant-seedlings-classification/train/Sugar beet/fc441208c.png | Sugar beet |
| 4747 | [[[[44 56 72], [52 63 75], [53 65 75], [46 59 69], [60 73 83], [48 63 74], [43 59 70], [57 71 83... | fed9406b2.png | unzipped/plant-seedlings-classification/train/Sugar beet/fed9406b2.png | Sugar beet |
| 4748 | [[[[144 141 145], [143 139 143], [146 142 146], [147 144 147], [147 145 148], [145 144 146], [14... | fef5e7066.png | unzipped/plant-seedlings-classification/train/Sugar beet/fef5e7066.png | Sugar beet |
| 4749 | [[[[71 90 99], [65 81 94], [68 83 97], [68 82 98], [ 70 84 101], [ 75 90 105], [ 78 95 109], ... | ffa401155.png | unzipped/plant-seedlings-classification/train/Sugar beet/ffa401155.png | Sugar beet |
We successfully created a DataFrame with the actual image (actual_img), the image file name (image_name), the image path (image_path) and the corresponding species_name.
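Note that row-by-row DataFrame.append was deprecated and removed in pandas 2.0; if this loop ever needs updating, the same mapping can be built by collecting rows in a Python list and constructing the DataFrame once, which is also much faster (a sketch keeping only the path/name/label columns; the function name is ours):

```python
import os
import pandas as pd

def create_dataframe_rows(train_dir):
    """Walk train_dir; one row per image, labelled by its folder name."""
    rows = []
    for root, _dirs, files in os.walk(train_dir):
        species_name = os.path.basename(root)  # folder name is the class label
        for f in files:
            if f == '.DS_Store':               # skip macOS metadata files
                continue
            rows.append({'image_path': os.path.join(root, f),
                         'image_name': f,
                         'species_name': species_name})
    return pd.DataFrame(rows)  # one construction instead of N appends
```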
import matplotlib.image as image
img = seedling_df.iloc[0]['actual_img']
plt.grid(False)
plt.title(seedling_df.iloc[0]['species_name'])
plt.imshow(np.squeeze(img))
plt.show()
C. Write a function that will select n random images and display the images along with their species. [4 Marks]
Hint: If the input for the function is 5, it should print 5 random images along with their labels.
import matplotlib.image as matimage
import random

## Display n random images from a DataFrame, titled by the given column
def display_random_images(no_of_images, data, title_name):
    randomlist = random.sample(range(0, data.shape[0]), no_of_images)
    print(randomlist)
    fig = plt.figure(figsize=(15, 15))
    columns = 3
    ## Work out how many rows the grid needs
    num_rows = no_of_images // columns   # quotient
    remainder = no_of_images % columns   # remainder
    rows = num_rows if remainder == 0 else num_rows + 1
    x = 0
    for i in randomlist:
        x = x + 1
        fig.add_subplot(rows, columns, x)
        plt.title(data.iloc[i][title_name])
        plt.grid(False)
        plt.imshow(np.squeeze(data.iloc[i]['actual_img']))
    plt.show()
display_random_images(5,seedling_df,'species_name')
[3566, 2019, 2370, 729, 378]
A. Create X & Y from the DataFrame. [2 Marks]
seedling_df.head()
| actual_img | image_name | image_path | species_name | |
|---|---|---|---|---|
| 0 | [[[[27 50 80], [18 42 71], [36 57 83], [45 66 89], [50 71 92], [ 61 81 100], [ 68 88 106], [ 6... | 0050f38b3.png | unzipped/plant-seedlings-classification/train/Black-grass/0050f38b3.png | Black-grass |
| 1 | [[[[37 43 55], [37 43 54], [40 46 57], [41 47 58], [48 53 64], [46 52 63], [47 52 64], [48 53 66... | 0183fdf68.png | unzipped/plant-seedlings-classification/train/Black-grass/0183fdf68.png | Black-grass |
| 2 | [[[[24 32 45], [21 30 44], [22 30 45], [24 31 46], [29 35 49], [25 32 44], [21 31 40], [24 34 42... | 0260cffa8.png | unzipped/plant-seedlings-classification/train/Black-grass/0260cffa8.png | Black-grass |
| 3 | [[[[ 51 84 108], [ 56 89 112], [ 54 88 110], [ 55 89 111], [ 55 89 112], [ 55 90 113], [ 5... | 05eedce4d.png | unzipped/plant-seedlings-classification/train/Black-grass/05eedce4d.png | Black-grass |
| 4 | [[[[165 162 162], [165 161 163], [160 157 158], [163 160 160], [162 161 159], [163 162 159], [16... | 075d004bc.png | unzipped/plant-seedlings-classification/train/Black-grass/075d004bc.png | Black-grass |
X=seedling_df.drop(labels= ["species_name"] , axis = 1)
X.head()
| actual_img | image_name | image_path | |
|---|---|---|---|
| 0 | [[[[27 50 80], [18 42 71], [36 57 83], [45 66 89], [50 71 92], [ 61 81 100], [ 68 88 106], [ 6... | 0050f38b3.png | unzipped/plant-seedlings-classification/train/Black-grass/0050f38b3.png |
| 1 | [[[[37 43 55], [37 43 54], [40 46 57], [41 47 58], [48 53 64], [46 52 63], [47 52 64], [48 53 66... | 0183fdf68.png | unzipped/plant-seedlings-classification/train/Black-grass/0183fdf68.png |
| 2 | [[[[24 32 45], [21 30 44], [22 30 45], [24 31 46], [29 35 49], [25 32 44], [21 31 40], [24 34 42... | 0260cffa8.png | unzipped/plant-seedlings-classification/train/Black-grass/0260cffa8.png |
| 3 | [[[[ 51 84 108], [ 56 89 112], [ 54 88 110], [ 55 89 111], [ 55 89 112], [ 55 90 113], [ 5... | 05eedce4d.png | unzipped/plant-seedlings-classification/train/Black-grass/05eedce4d.png |
| 4 | [[[[165 162 162], [165 161 163], [160 157 158], [163 160 160], [162 161 159], [163 162 159], [16... | 075d004bc.png | unzipped/plant-seedlings-classification/train/Black-grass/075d004bc.png |
y=seedling_df['species_name']
y.head()
0    Black-grass
1    Black-grass
2    Black-grass
3    Black-grass
4    Black-grass
Name: species_name, dtype: object
B. Encode labels of the images. [2 Marks]
## Encode the labels with LabelEncoder
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y = encoder.fit_transform(y)
y
array([ 0, 0, 0, ..., 11, 11, 11])
encoder.inverse_transform([11])
array(['Sugar beet'], dtype=object)
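The encode/decode round trip above can also be reproduced with plain NumPy, which makes LabelEncoder's behaviour explicit: classes are sorted alphabetically and each label is replaced by its index (a sketch with made-up labels):

```python
import numpy as np

labels = np.array(['Charlock', 'Maize', 'Charlock', 'Sugar beet'])
# np.unique sorts the classes; return_inverse gives each label's index,
# mirroring encoder.classes_ and encoder.transform
classes_sorted, encoded = np.unique(labels, return_inverse=True)
decoded = classes_sorted[encoded]  # the inverse_transform equivalent
```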
C. Unify shape of all the images. [2 Marks]
NUM_CLASSES=len(classes)
NUM_CLASSES
12
HEIGHT=128
WIDTH=128
CHANNELS=3
input_shape=(WIDTH, HEIGHT, CHANNELS)
X.shape
(4750, 3)
from keras.preprocessing.image import load_img, img_to_array

def unify_image_shape(data):
    X = []
    # Loop through the DataFrame to get every image path
    for i in data.index:
        imagepath = data.iloc[i]['image_path']
        if i == 0:
            print(imagepath)
        img = load_img(imagepath)
        arr = img_to_array(img)                 # NumPy array at the original size
        arr = cv2.resize(arr, (HEIGHT, WIDTH))  # NumPy array with shape (HEIGHT, WIDTH, 3)
        X.append(arr)
    return X
X=unify_image_shape(seedling_df)
unzipped/plant-seedlings-classification/train/Black-grass/0050f38b3.png
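cv2.resize does the heavy lifting above; for intuition, a nearest-neighbour resize can be sketched with pure NumPy index mapping (an illustration only, not a replacement for OpenCV's interpolation; the function name is ours):

```python
import numpy as np

def resize_nearest(img, height, width):
    """Nearest-neighbour resize of an (H, W, C) array to (height, width, C)."""
    h, w = img.shape[:2]
    rows = np.arange(height) * h // height   # source row for each output row
    cols = np.arange(width) * w // width     # source column for each output column
    return img[rows[:, None], cols]          # fancy indexing picks the pixels
```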
D. Normalise all the images. [2 Marks]
X[0]
array([[[ 77.51807 , 47.181885, 23.59668 ],
[ 81.575195, 54.592285, 33.248535],
[ 88.85132 , 66.33325 , 45.246094],
...,
[ 88.18091 , 78.25903 , 55.638184],
[ 93.99878 , 86.031494, 60.007324],
[ 89.57715 , 82.78027 , 52.092285]],
[[ 78.69751 , 46.112305, 21.431885],
[ 85.65747 , 57.006836, 36.235596],
[ 90.18872 , 65.856445, 45.848145],
...,
[ 87.93652 , 76.74487 , 59.463623],
[ 91.796875, 81.04126 , 62.025635],
[ 88.947266, 77.55664 , 57.29663 ]],
[[ 82.56909 , 49.10034 , 24.694092],
[ 84.1958 , 53.528076, 31.188477],
[ 94.71338 , 69.385254, 50.605713],
...,
[ 84.0105 , 72.0105 , 57.44629 ],
[ 84.75 , 72.488525, 57.613525],
[ 89.02881 , 76.78784 , 61.356934]],
...,
[[142.64722 , 148.82568 , 155.56006 ],
[143.73755 , 149.87402 , 155.87402 ],
[142.698 , 149.04175 , 154.36987 ],
...,
[131.81934 , 136.27075 , 141.927 ],
[ 91.41138 , 91.8645 , 97.12598 ],
[ 67.36426 , 61.326416, 67.30176 ]],
[[142.31543 , 148.73022 , 155.31543 ],
[142.52686 , 149.68872 , 155.68872 ],
[140.75732 , 147.49585 , 152.63232 ],
...,
[130.19897 , 134.73438 , 140.45728 ],
[ 86.85889 , 87.47827 , 92.84326 ],
[ 59.34375 , 53.96167 , 60.546875]],
[[141.72632 , 149.4607 , 156. ],
[140.82104 , 149.3523 , 155.3523 ],
[139.05762 , 147.2854 , 152.95728 ],
...,
[129.44409 , 133.44409 , 139.92188 ],
[ 84.356445, 85.286865, 91.286865],
[ 61.772705, 57.967773, 63.967773]]], dtype=float32)
X= np.array(X, dtype="float") / 255.0
X[0]
array([[[0.30399242, 0.185027 , 0.092536 ],
[0.31990273, 0.21408739, 0.13038641],
[0.34843654, 0.2601304 , 0.17743566],
...,
[0.34580748, 0.30689817, 0.21818896],
[0.36862266, 0.33737841, 0.23532284],
[0.35128294, 0.32462852, 0.20428347]],
[[0.30861769, 0.18083257, 0.08404661],
[0.33591165, 0.22355622, 0.14210038],
[0.35368126, 0.25826057, 0.17979665],
...,
[0.34484911, 0.30096029, 0.23319068],
[0.35998775, 0.31780886, 0.24323778],
[0.34881281, 0.30414369, 0.22469267]],
[[0.32380036, 0.19255036, 0.09683958],
[0.33017961, 0.20991402, 0.12230775],
[0.37142502, 0.27209903, 0.19845378],
...,
[0.32945293, 0.28239411, 0.22527956],
[0.33235294, 0.28426873, 0.22593539],
[0.34913258, 0.30112879, 0.24061543]],
...,
[[0.55940085, 0.58363013, 0.61003945],
[0.56367666, 0.58774127, 0.61127068],
[0.55959999, 0.58447744, 0.60537205],
...,
[0.51693857, 0.53439511, 0.55657648],
[0.35847599, 0.36025295, 0.38088618],
[0.26417356, 0.24049575, 0.26392846]],
[[0.55809972, 0.58325578, 0.60908012],
[0.55892884, 0.58701459, 0.610544 ],
[0.55198951, 0.5784151 , 0.59855813],
...,
[0.51058421, 0.5283701 , 0.55081284],
[0.34062309, 0.34305205, 0.36409122],
[0.23272059, 0.21161439, 0.23743873]],
[[0.55578948, 0.58612037, 0.61176471],
[0.55223939, 0.58569527, 0.60922469],
[0.54532399, 0.57758981, 0.59983245],
...,
[0.50762389, 0.52331016, 0.54871324],
[0.33080959, 0.3344583 , 0.35798771],
[0.2422459 , 0.2273246 , 0.25085401]]])
After dividing by 255, all images are normalized to the [0, 1] range and are ready for model building.
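The normalization here is simply a division by the maximum 8-bit pixel value; a quick sanity check with a toy array confirms the result stays in [0, 1]:

```python
import numpy as np

raw = np.array([[0.0, 127.5, 255.0]], dtype='float32')  # toy pixel values
normalized = raw / 255.0
# every normalized value must now lie in the [0, 1] range
in_range = (normalized >= 0.0).all() and (normalized <= 1.0).all()
```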
A. Split the data into train and test data. [2 Marks]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
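A possible refinement (not used above) is the stratify parameter, which keeps each species' proportion the same in the train and test splits; this matters when classes are imbalanced. A sketch with a toy balanced label array:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features each
y_demo = np.array([0] * 5 + [1] * 5)    # two balanced classes
# stratify=y_demo keeps the 50/50 class ratio in both splits
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42, stratify=y_demo)
```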
B. Create new CNN architecture to train the model. [4 Marks]
## Define the CNN model. It accepts height, width, number of channels and number of classes as parameters
def build_cnn_model(height, width, num_channels, num_classes):
    model = Sequential()
    model.add(Conv2D(32, (5, 5), activation='relu', input_shape=(height, width, num_channels)))
    model.add(MaxPooling2D(pool_size=3))
    model.add(Conv2D(filters=64, kernel_size=4, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Conv2D(filters=128, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))
    model.add(Conv2D(filters=128, kernel_size=2, padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=2))
    model.add(Dropout(0.2))
    model.add(Flatten())
    # fully connected layer
    model.add(Dense(units=500, activation='relu'))
    model.add(Dropout(0.2))
    # output layer
    model.add(Dense(units=num_classes, activation='softmax'))
    model.summary()
    return model
HEIGHT=128
WIDTH=128
CHANNELS=3
## Building CNN model
cnn_seedling_model = build_cnn_model(HEIGHT, WIDTH, CHANNELS,NUM_CLASSES)
Model: "sequential_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_26 (Conv2D) (None, 124, 124, 32) 2432
max_pooling2d_26 (MaxPoolin (None, 41, 41, 32) 0
g2D)
conv2d_27 (Conv2D) (None, 41, 41, 64) 32832
max_pooling2d_27 (MaxPoolin (None, 20, 20, 64) 0
g2D)
conv2d_28 (Conv2D) (None, 20, 20, 128) 73856
max_pooling2d_28 (MaxPoolin (None, 10, 10, 128) 0
g2D)
dropout_15 (Dropout) (None, 10, 10, 128) 0
conv2d_29 (Conv2D) (None, 10, 10, 128) 65664
max_pooling2d_29 (MaxPoolin (None, 5, 5, 128) 0
g2D)
dropout_16 (Dropout) (None, 5, 5, 128) 0
flatten_8 (Flatten) (None, 3200) 0
dense_15 (Dense) (None, 500) 1600500
dropout_17 (Dropout) (None, 500) 0
dense_16 (Dense) (None, 12) 6012
=================================================================
Total params: 1,781,296
Trainable params: 1,781,296
Non-trainable params: 0
_________________________________________________________________
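The output shapes in the summary can be verified by hand with the standard formula out = (n + 2p - k)//s + 1, applied layer by layer; 'same' convolutions keep the spatial size, so only the valid 5x5 conv and the pooling layers shrink it (a sketch):

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial size after one conv/pool layer (floor division, as Keras does)."""
    return (n + 2 * padding - kernel) // stride + 1

n = out_size(128, 5)           # valid 5x5 conv -> 124
n = out_size(n, 3, stride=3)   # pool_size=3    -> 41
n = out_size(n, 2, stride=2)   # pool_size=2    -> 20 ('same' conv kept 41)
n = out_size(n, 2, stride=2)   # pool_size=2    -> 10
n = out_size(n, 2, stride=2)   # pool_size=2    -> 5
flat = n * n * 128             # 3200 units into Flatten, matching the summary
```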
C. Train the model on train data and validate on test data. [2 Marks]
## Compile the model
cnn_seedling_model.compile(loss="categorical_crossentropy", metrics=["accuracy"], optimizer="adam")
### Train the model
print('Train data shape')
print('---------------')
print(X_train.shape)
print(y_train.shape)
print('Test data shape')
print('---------------')
print(X_test.shape)
print(y_test.shape)
Train data shape
---------------
(3325, 128, 128, 3)
(3325,)
Test data shape
---------------
(1425, 128, 128, 3)
(1425,)
print('y_test unique classes', np.unique(y_test))
y_test unique classes [ 0 1 2 3 4 5 6 7 8 9 10 11]
y_train=to_categorical(y_train)
y_test=to_categorical(y_test)
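to_categorical turns each integer label into a one-hot row so it matches the softmax output; the same transform can be written with np.eye, which makes the shape change explicit (a sketch; the function name is ours):

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Integer labels -> one-hot matrix, mirroring keras to_categorical."""
    return np.eye(num_classes, dtype='float32')[labels]
```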
def fit_model(model, X_train, y_train, X_val, y_val, epochs, batch_size):
    history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                        validation_data=(X_val, y_val))
    return history
epochs = 25
batchsize = 100
cnn_seed_history = fit_model(cnn_seedling_model, X_train, y_train, X_test, y_test, epochs, batchsize)
Epoch 1/25
34/34 [==============================] - 108s 3s/step - loss: 2.3966 - accuracy: 0.1600 - val_loss: 2.1957 - val_accuracy: 0.2632
Epoch 2/25
34/34 [==============================] - 93s 3s/step - loss: 1.9051 - accuracy: 0.3489 - val_loss: 1.5968 - val_accuracy: 0.4554
Epoch 3/25
34/34 [==============================] - 91s 3s/step - loss: 1.6084 - accuracy: 0.4283 - val_loss: 1.4270 - val_accuracy: 0.5004
Epoch 4/25
34/34 [==============================] - 93s 3s/step - loss: 1.3767 - accuracy: 0.5095 - val_loss: 1.2112 - val_accuracy: 0.5846
Epoch 5/25
34/34 [==============================] - 93s 3s/step - loss: 1.1934 - accuracy: 0.5817 - val_loss: 1.0341 - val_accuracy: 0.6281
Epoch 6/25
34/34 [==============================] - 93s 3s/step - loss: 1.0099 - accuracy: 0.6457 - val_loss: 0.9274 - val_accuracy: 0.6933
Epoch 7/25
34/34 [==============================] - 96s 3s/step - loss: 0.8956 - accuracy: 0.6911 - val_loss: 0.9418 - val_accuracy: 0.6611
Epoch 8/25
34/34 [==============================] - 93s 3s/step - loss: 0.8298 - accuracy: 0.7080 - val_loss: 0.7532 - val_accuracy: 0.7411
Epoch 9/25
34/34 [==============================] - 249s 7s/step - loss: 0.7116 - accuracy: 0.7462 - val_loss: 0.7513 - val_accuracy: 0.7361
Epoch 10/25
34/34 [==============================] - 279s 8s/step - loss: 0.6533 - accuracy: 0.7714 - val_loss: 0.6650 - val_accuracy: 0.7761
Epoch 11/25
34/34 [==============================] - 1230s 37s/step - loss: 0.5863 - accuracy: 0.8033 - val_loss: 0.6734 - val_accuracy: 0.7782
Epoch 12/25
34/34 [==============================] - 226s 7s/step - loss: 0.5401 - accuracy: 0.8105 - val_loss: 0.6726 - val_accuracy: 0.7775
Epoch 13/25
34/34 [==============================] - 97s 3s/step - loss: 0.4809 - accuracy: 0.8289 - val_loss: 0.5973 - val_accuracy: 0.8021
Epoch 14/25
34/34 [==============================] - 100s 3s/step - loss: 0.4718 - accuracy: 0.8352 - val_loss: 0.6996 - val_accuracy: 0.7642
Epoch 15/25
34/34 [==============================] - 94s 3s/step - loss: 0.4541 - accuracy: 0.8400 - val_loss: 0.6734 - val_accuracy: 0.7818
Epoch 16/25
34/34 [==============================] - 121s 4s/step - loss: 0.4063 - accuracy: 0.8565 - val_loss: 0.5616 - val_accuracy: 0.8119
Epoch 17/25
34/34 [==============================] - 95s 3s/step - loss: 0.3474 - accuracy: 0.8722 - val_loss: 0.5522 - val_accuracy: 0.8211
Epoch 18/25
34/34 [==============================] - 97s 3s/step - loss: 0.3655 - accuracy: 0.8671 - val_loss: 0.5557 - val_accuracy: 0.8246
Epoch 19/25
34/34 [==============================] - 97s 3s/step - loss: 0.2951 - accuracy: 0.8950 - val_loss: 0.5582 - val_accuracy: 0.8232
Epoch 20/25
34/34 [==============================] - 101s 3s/step - loss: 0.2752 - accuracy: 0.8998 - val_loss: 0.5995 - val_accuracy: 0.8140
Epoch 21/25
34/34 [==============================] - 95s 3s/step - loss: 0.2554 - accuracy: 0.9080 - val_loss: 0.5525 - val_accuracy: 0.8379
Epoch 22/25
34/34 [==============================] - 95s 3s/step - loss: 0.2185 - accuracy: 0.9212 - val_loss: 0.6129 - val_accuracy: 0.8084
Epoch 23/25
34/34 [==============================] - 94s 3s/step - loss: 0.2031 - accuracy: 0.9224 - val_loss: 0.5968 - val_accuracy: 0.8189
Epoch 24/25
34/34 [==============================] - 95s 3s/step - loss: 0.2275 - accuracy: 0.9110 - val_loss: 0.6058 - val_accuracy: 0.8175
Epoch 25/25
34/34 [==============================] - 116s 3s/step - loss: 0.2123 - accuracy: 0.9155 - val_loss: 0.6964 - val_accuracy: 0.8021
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
def plot_accuracy_loss(history):
    accuracy = history.history['accuracy']
    val_accuracy = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(accuracy))  # number of epochs
    plt.plot(epochs, accuracy, label='training accuracy')
    plt.plot(epochs, val_accuracy, label='validation accuracy')
    plt.title('Training and validation accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epochs')
    plt.legend(loc='lower right')
    plt.figure()
    plt.plot(epochs, loss, label='training loss')
    plt.plot(epochs, val_loss, label='validation loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(loc='upper right')
    plt.title('Training and validation loss')
plot_accuracy_loss(cnn_seed_history)
The graphs show that both training and validation accuracy improve steadily. Over the 25 epochs, training accuracy reached about 91.55%, while validation (test) accuracy reached about 80%.
# Final evaluation of the model
def evaluate_model(model, X_test, y_test):
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Loss:", scores[0])
    print("Accuracy:", scores[1])
## Test Loss and Test Accuracy Details
evaluate_model(cnn_seedling_model,X_test,y_test)
Loss: 0.6964271664619446
Accuracy: 0.8021052479743958
scores = cnn_seedling_model.evaluate(X_test, y_test, verbose=0)
scores
[0.6964271664619446, 0.8021052479743958]
## Test and Training accuracy of the model
cnn_train_acc=cnn_seed_history.history['accuracy'][-1]
cnn_test_acc=cnn_seed_history.history['val_accuracy'][-1]
cnn_train_loss=cnn_seed_history.history['loss'][-1]
cnn_test_loss=cnn_seed_history.history['val_loss'][-1]
print('Training accuracy',cnn_train_acc,'cnn_test_acc',cnn_test_acc,'train_loss',cnn_train_loss,'test_loss',cnn_test_loss)
Training accuracy 0.9154887199401855 cnn_test_acc 0.8021052479743958 train_loss 0.2122935801744461 test_loss 0.6964271664619446
#predict on test
y_predict = cnn_seedling_model.predict(X_test)
import itertools
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
len(classes)
12
import itertools
plt.subplots(figsize=(22,7)) #set the size of the plot
# Predict the values from the validation dataset
#y_pred = cnn_model.predict(X_test)
# Convert predictions classes to one hot vectors
y_pred_classes = np.argmax(y_predict,axis = 1)
# Convert validation observations to one hot vectors
y_true = np.argmax(y_test,axis = 1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(y_true, y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(12))
As per the confusion matrix, the model predicts most classes well; misclassification mainly happens for class 6 (Loose Silky-bent).
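The visual impression can be quantified: the diagonal of the confusion matrix divided by each row sum gives the per-class recall, which makes the weak class easy to spot (a sketch with a toy 2x2 matrix; the function name is ours):

```python
import numpy as np

def per_class_recall(cm):
    """Recall per true class: correct predictions over all samples of that class."""
    return np.diag(cm) / cm.sum(axis=1)

# toy example: class 0 has 8/10 correct, class 1 has 9/10 correct
toy_cm = np.array([[8, 2],
                   [1, 9]])
recalls = per_class_recall(toy_cm)
```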
D. Select a random image and print actual label and predicted label for the same. [2 Marks]
seedling_df.shape
(4750, 3)
import random
random_number = random.randint(0, seedling_df.shape[0] - 1)
print(random_number)
808
import matplotlib.image as image
image_det=seedling_df.iloc[random_number]['image_path']
print(seedling_df.iloc[random_number]['image_path'])
img = image.imread(image_det)
plt.grid(False)
plt.title(seedling_df.iloc[random_number]['species_name'])
new=plt.imshow(img)
plt.show()
unzipped/plant-seedlings-classification/train/Cleavers/85b23f3e6.png
def predict_seedling_image(img, cnn_seedling_model):
    img_width = 128
    img_height = 128
    img = cv2.resize(img, (img_width, img_height), interpolation=cv2.INTER_CUBIC)
    img = np.reshape(img, (1, img_width, img_height, 3))  # add the batch axis
    img = img / 255.                                      # same scaling as training
    pred = cnn_seedling_model.predict(img)
    print(pred)
    class_num = np.argmax(pred)
    print(class_num)
    return class_num, np.max(pred)
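The resize/reshape/scale chain above is the part most worth checking in isolation; a NumPy-only stand-in (nearest-neighbour indexing in place of cv2.resize, an assumption made purely for illustration) shows the expected shape and range at each step:

```python
import numpy as np

def preprocess_for_predict(img, size=128):
    """Resize (nearest-neighbour), add a batch axis, scale to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows[:, None], cols]                         # (size, size, 3)
    return resized[np.newaxis, ...].astype('float32') / 255.0  # (1, size, size, 3)
```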
import cv2
print(image_det)
input_image = cv2.imread(image_det)
input_image_height, input_image_width, input_image_channels = input_image.shape
unzipped/plant-seedlings-classification/train/Cleavers/85b23f3e6.png
input_image.shape
(382, 382, 3)
cnn_predict_class, cnn_pred_proba=predict_seedling_image(input_image,cnn_seedling_model)
[[7.2001555e-08 6.6989199e-03 9.8549485e-01 1.3310973e-08 2.4912137e-04 7.5408649e-03 1.5265911e-08 2.8784050e-06 1.1044883e-06 1.7099243e-06 3.0365015e-06 7.3305305e-06]] 2
cnn_predict_class
2
print("Predicted class using the built model is", cnn_predict_class, "with a probability of", cnn_pred_proba)
Predicted class using the built model is 2 with a probability of 0.98549485
cnn_pred_proba
0.98549485
encoder.inverse_transform([cnn_predict_class])
array(['Cleavers'], dtype=object)
DOMAIN: Botanical Research
CONTEXT: University X is conducting research to understand the characteristics of flowers. They have already invested in curating sample images. They require an automated solution that can determine a flower's species from a photo.
• DATA DESCRIPTION: The dataset comprises images from 17 flower species.
• PROJECT OBJECTIVE: To experiment with various approaches to train an image classifier to predict the type of flower from an image.
1. Import and Understand the data [5 Marks]
A. Import and read the oxflower17 dataset from tflearn and split into X and Y while loading. [2 Marks]
import tflearn
import tflearn.datasets.oxflower17 as oxflower17
WARNING:tensorflow:From C:\Users\HP\anaconda3\lib\site-packages\tensorflow\python\compat\v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version. Instructions for updating: non-resource variables are not supported in the long term
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import collections
import seaborn as sns
import pandas as pd
from sklearn.metrics import mean_squared_error, confusion_matrix, classification_report, roc_curve, precision_recall_curve, roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense, Activation, LeakyReLU
from keras import optimizers
# Import label encoder
from sklearn import preprocessing
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from tensorflow import keras
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from keras.layers import InputLayer, BatchNormalization
from tensorflow.keras import layers
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")
#Load the dataset and split into X and y
X, y = oxflower17.load_data()
Comments:
The data is loaded from the oxflower17 dataset and split into X and y successfully.
B. Print the number of images and the shape of the images. [1 Mark]
X.shape, y.shape
((1360, 224, 224, 3), (1360,))
print('Number of Images',X.shape[0])
Number of Images 1360
print('Shape of the Images',X.shape[1],X.shape[2],X.shape[3])
Shape of the Images 224 224 3
print(type(X))
print(type(y))
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
Observation
## to have a look at first image
index=5
X[index]
array([[[0.12156863, 0.16078432, 0.02745098],
[0.1254902 , 0.16470589, 0.03137255],
[0.12941177, 0.16862746, 0.03529412],
...,
[0.21568628, 0.2627451 , 0.03137255],
[0.17254902, 0.20784314, 0.02352941],
[0.14901961, 0.1764706 , 0.03529412]],
[[0.13333334, 0.17254902, 0.04313726],
[0.13333334, 0.16862746, 0.03921569],
[0.13333334, 0.17254902, 0.04313726],
...,
[0.1882353 , 0.23137255, 0.00392157],
[0.16470589, 0.2 , 0.01960784],
[0.15686275, 0.18431373, 0.04313726]],
[[0.11764706, 0.15294118, 0.03137255],
[0.12941177, 0.16470589, 0.04313726],
[0.13333334, 0.16862746, 0.04705882],
...,
[0.16470589, 0.20784314, 0. ],
[0.14901961, 0.1882353 , 0.01176471],
[0.15294118, 0.1882353 , 0.02745098]],
...,
[[0.0627451 , 0.09803922, 0.01176471],
[0.05882353, 0.09803922, 0.00392157],
[0.06666667, 0.10588235, 0.00784314],
...,
[0.07450981, 0.10980392, 0.01176471],
[0.08235294, 0.11372549, 0.01960784],
[0.07843138, 0.11372549, 0.01568628]],
[[0.07450981, 0.11764706, 0.00392157],
[0.07450981, 0.11764706, 0.00392157],
[0.07843138, 0.11764706, 0.00784314],
...,
[0.08235294, 0.09803922, 0.01568628],
[0.08235294, 0.09803922, 0.01960784],
[0.09411765, 0.10980392, 0.03137255]],
[[0.07450981, 0.11764706, 0. ],
[0.07843138, 0.12156863, 0.00392157],
[0.07843138, 0.12156863, 0.00784314],
...,
[0.08235294, 0.09411765, 0.01960784],
[0.08235294, 0.09411765, 0.01960784],
[0.09411765, 0.10196079, 0.03137255]]], dtype=float32)
plt.grid(False)
img=plt.imshow(X[index])
The image has a shape of 224 x 224 x 3 and is displayed above.
## To get the image label
print('The image label is',y[index])
The image label is 4
C. Print count of each class from y. [2 Marks]
np.unique(y)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
Observation
collections.Counter(y)
Counter({4: 80,
13: 80,
9: 80,
0: 80,
3: 80,
12: 80,
8: 80,
6: 80,
10: 80,
14: 80,
1: 80,
11: 80,
7: 80,
16: 80,
15: 80,
2: 80,
5: 80})
unique, counts = np.unique(y, return_counts=True)
class_dict=dict(zip(unique, counts))
class_dict
{0: 80,
1: 80,
2: 80,
3: 80,
4: 80,
5: 80,
6: 80,
7: 80,
8: 80,
9: 80,
10: 80,
11: 80,
12: 80,
13: 80,
14: 80,
15: 80,
16: 80}
# Print the names of the columns.
print ("{:<10} {:<10}".format('Class Name', 'Count'))
# print each data item.
for key, value in class_dict.items():
print ("{:<10} {:<10}".format(key, value))
Class Name Count
0          80
1          80
2          80
3          80
4          80
5          80
6          80
7          80
8          80
9          80
10         80
11         80
12         80
13         80
14         80
15         80
16         80
#plotting the graph
sns.countplot(x=y) # pass the data as a keyword argument; positional use is deprecated in recent seaborn
plt.show()
The per-class counts are printed and the count plot visualizes them: every class from 0 to 16 has exactly 80 images, so the dataset is perfectly balanced.
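The balance noted above can also be asserted programmatically. A small sketch using numpy; `y_demo` is a synthetic stand-in for the notebook's label array (17 classes × 80 samples), not the actual data:

```python
import numpy as np

# Synthetic stand-in for the notebook's label array: 17 classes x 80 samples.
y_demo = np.repeat(np.arange(17), 80)

# Count each class and confirm the dataset is perfectly balanced.
unique, counts = np.unique(y_demo, return_counts=True)
assert len(unique) == 17          # 17 distinct labels, 0..16
assert np.all(counts == 80)       # every class has exactly 80 images
print(dict(zip(unique.tolist(), counts.tolist())))
```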
2. Image Exploration & Transformation [Learning purpose - Not related to final model] [10 Marks]
plt.imshow(X[50])
plt.grid(False)
plt.title("Displaying a random image", fontsize=14)
Text(0.5, 1.0, 'Displaying a random image')
import random
n = random.randint(0, 1359) # randint is inclusive at both ends; valid indices for 1360 images are 0-1359
print(n)
1223
A. Display 5 random images. [1 Marks]
## method to display 5 random images
def display_random_images(no_of_images):
randomlist = random.sample(range(0, 1360), no_of_images)
print(randomlist)
w = 10
h = 10
fig = plt.figure(figsize=(15, 15))
columns = 3
## Logic to find the number of grids to represent
num_rows=no_of_images//columns # quotient: number of full rows
remainder=no_of_images%columns # remainder: images left over for a partial row
if(remainder==0):
rows=num_rows
else:
rows=num_rows+1
x=0
for i in randomlist:
x=x+1
fig.add_subplot(rows, columns, x)
plt.title('image at position '+str(i) +' label '+str(y[i]))
plt.grid(False)
plt.imshow(X[i])
plt.show()
##Invoking the Display method to display 5 images
display_random_images(5)
[851, 1155, 134, 1219, 627]
Comments
The 5 random images are displayed.
display_random_images(9)
[694, 1176, 390, 1070, 1155, 669, 97, 1346, 1184]
B. Select any image from the dataset and assign it to a variable. [1 Marks]
sample_img = X[118]
Comments
An image is stored in a variable
C. Transform the image into grayscale format and display the same. [3 Marks]
plt.style.use('default')
plt.imshow(sample_img)
#plt.grid('False')
plt.title("Sample image", fontsize=14)
Text(0.5, 1.0, 'Sample image')
##Method to convert Gray scale
def rgb2gray(rgb):
return np.dot(rgb[...,:3], [0.2989, 0.5870, 0.1140])
gray = rgb2gray(sample_img)
plt.imshow(gray, cmap=plt.get_cmap('gray'), vmin=0, vmax=1)
#plt.grid(False)
plt.show()
D. Apply a filter to sharpen the image and display the image before and after sharpening. [2 Marks]
### Before Sharpening
plt.imshow(sample_img)
#plt.grid(False)
plt.show()
##Method to Sharpen the image by applying filter
kernel = np.array([[-1,-1,-1],
[-1, 9,-1],
[-1,-1,-1]])
sharpened = cv2.filter2D(sample_img, -1, kernel) # applying the sharpening kernel to the input image & displaying it.
# cv2.imshow opens a native GUI window and can hang or fail inside notebooks; plt.imshow below is used instead
plt.imshow(sharpened, cmap=plt.get_cmap('gray'), vmin=0, vmax=1)
plt.grid(False)
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
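The warning above appears because the sharpening kernel can push float pixel values outside [0, 1], and matplotlib clips them silently. Clipping explicitly before display avoids the warning. A minimal sketch, using a small hypothetical array `demo` in place of the notebook's `sharpened` image:

```python
import numpy as np

# filter2D on a float image can produce values < 0 or > 1, which is
# what triggers matplotlib's clipping warning. Clip explicitly instead.
def clip_for_display(img):
    return np.clip(img, 0.0, 1.0)

# Hypothetical 2x2 float "image" with out-of-range values.
demo = np.array([[-0.2, 0.5], [1.3, 0.9]], dtype=np.float32)
print(clip_for_display(demo))  # -0.2 becomes 0.0, 1.3 becomes 1.0
```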
E. Apply a filter to blur the image and display the image before and after blur. [2 Mark]
image_blurred = cv2.blur(src=sample_img, ksize=(5, 5))
# cv2.imshow opens a native GUI window and can hang or fail inside notebooks; plt.imshow below is used instead
#plt.grid(False)
plt.imshow(image_blurred)
<matplotlib.image.AxesImage at 0x18c63542430>
F. Display all the 4 images from above questions beside each other to observe the difference. [1 Mark]
# Map display titles to their corresponding images
keys = ['Original Image', 'Grayed Image', 'Sharpened Image','Blurred Image']
image_list=[sample_img,gray,sharpened,image_blurred]
dictionary = dict(zip(keys, image_list))
x=0
rows=2
columns=2
w = 10
h = 10
fig = plt.figure(figsize=(14, 14))
for key in dictionary:
#img = np.random.randint(10, size=(h,w))
x=x+1
fig.add_subplot(rows, columns, x)
plt.title(key)
plt.grid(False)
if(key=='Grayed Image'):
#plt.imshow(gray, cmap=plt.get_cmap('gray'), vmin=0, vmax=1)
plt.imshow(dictionary[key],cmap=plt.get_cmap('gray'),)
else:
plt.imshow(dictionary[key])
#plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Observation: the grayscale, sharpened, and blurred variants are displayed beside the original, making the effect of each transformation easy to compare.
3. Model training and Tuning: [15 Marks]
A. Split the data into train and test with 80:20 proportion. [2 Marks]
from sklearn.model_selection import train_test_split
# Split X and y into training and test set in 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
print('Train data shape')
print('---------------')
print(X_train.shape)
print(y_train.shape)
print('Test data shape')
print('---------------')
print(X_test.shape)
print(y_test.shape)
Train data shape
---------------
(1088, 224, 224, 3)
(1088,)
Test data shape
---------------
(272, 224, 224, 3)
(272,)
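Since every class has exactly 80 images, passing `stratify=y` to `train_test_split` would preserve that balance in both subsets. A hedged sketch on synthetic stand-in data (`X_demo`/`y_demo` are illustrative, not the notebook's arrays):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 17 classes x 80 one-feature samples.
y_demo = np.repeat(np.arange(17), 80)
X_demo = y_demo.reshape(-1, 1).astype(np.float32)

# stratify keeps the 80-per-class balance intact: 64 train / 16 test per class.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42, stratify=y_demo)

_, tr_counts = np.unique(y_tr, return_counts=True)
_, te_counts = np.unique(y_te, return_counts=True)
print(tr_counts[0], te_counts[0])  # 64 16
```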
B. Train a model using any Supervised Learning algorithm and share performance metrics on test data. [3 Marks]
X_train[0]
array([[[0.21960784, 0.34901962, 0.16862746], ... (224 x 224 x 3 normalized pixel values in [0, 1]; output truncated) ...]], dtype=float32)
nsamples, nx, ny, channels= X_train.shape
print(nsamples,nx,ny,channels)
1088 224 224 3
Dim_2_X_train = X_train.reshape((nsamples,nx*ny*channels))
Dim_2_X_test = X_test.reshape(X_test.shape[0],nx*ny*channels)
seed=0
from sklearn.linear_model import LogisticRegression
# use logistic regression as the model
print ("[INFO] creating model...")
model = LogisticRegression(random_state=seed)
model.fit(Dim_2_X_train, y_train)
[INFO] creating model...
LogisticRegression(random_state=0)
#predict on test
y_predict = model.predict(Dim_2_X_test)
model_score_train= model.score(Dim_2_X_train, y_train)
print(model_score_train)
1.0
model_score = model.score(Dim_2_X_test, y_test)
print(model_score)
0.5110294117647058
## Display the confusion matrix
# display the confusion matrix
print ("[INFO] confusion matrix")
# get the list of training labels
#labels = unique
# plot the confusion matrix
cm = confusion_matrix(y_test, y_predict)
sns.heatmap(cm,
annot=True,
cmap="Set2")
plt.show()
[INFO] confusion matrix
print(classification_report(y_test, y_predict)) # ground truth comes first, predictions second
precision recall f1-score support
0 0.50 0.41 0.45 17
1 0.30 0.32 0.31 22
2 0.75 0.50 0.60 18
3 0.58 0.58 0.58 12
4 0.62 0.38 0.48 13
5 0.43 0.60 0.50 15
6 0.53 0.42 0.47 19
7 0.53 0.42 0.47 19
8 0.60 0.56 0.58 16
9 0.53 0.56 0.55 16
10 0.68 0.57 0.62 23
11 0.33 0.60 0.43 10
12 0.68 0.71 0.70 21
13 0.50 0.67 0.57 9
14 0.07 0.08 0.07 13
15 0.56 0.53 0.54 19
16 0.62 1.00 0.77 10
accuracy 0.51 272
macro avg 0.52 0.52 0.51 272
weighted avg 0.53 0.51 0.51 272
model_performance_flower_cl=pd.DataFrame(columns=['Model', 'Accuracy', 'Test Accuracy','Loss','Test Loss'])
## DataFrame.append was removed in pandas 2.0; pd.concat is the supported way to add a row
new_row = pd.DataFrame([{'Model':'Logistic Regression',
'Accuracy': model_score_train,
'Test Accuracy':model_score,
'Loss': 'NA',
'Test Loss':'NA'}])
model_performance_flower_cl = pd.concat([model_performance_flower_cl, new_row], ignore_index=True)
model_performance_flower_cl
|   | Model | Accuracy | Test Accuracy | Loss | Test Loss |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 1.0 | 0.511029 | NA | NA |
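Train accuracy of 1.0 against test accuracy of ~0.51 shows the logistic regression memorizes the 150,528-dimensional flattened-pixel features. One hedged mitigation (not part of the original notebook) is stronger L2 regularization via a smaller `C`; the sketch below illustrates the shrinkage effect on small synthetic data, not the notebook's arrays:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic stand-in for the flattened-pixel matrix.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 50))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# Smaller C = stronger L2 penalty, shrinking coefficients and
# reducing the model's capacity to memorize the training set.
loose = LogisticRegression(C=1.0, max_iter=1000).fit(X_demo, y_demo)
tight = LogisticRegression(C=0.01, max_iter=1000).fit(X_demo, y_demo)
print(np.abs(loose.coef_).sum(), np.abs(tight.coef_).sum())
```

With `C=0.01` the coefficient magnitudes are much smaller, trading a little training accuracy for less overfitting.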
from sklearn.neighbors import KNeighborsClassifier
# KNN model on the flattened features without hyperparameter tuning
# (no explicit scaling is applied; the pixel values are already normalized to [0, 1])
knn = KNeighborsClassifier(n_neighbors = 5)
knn.fit(Dim_2_X_train,y_train)
print('k-Nearest Neighbor Classifier Scores without Hyperparameter Tuning\n\n')
print('k-NN accuracy for train set: {0:.3f}'.format(knn.score(Dim_2_X_train, y_train)))
print('k-NN accuracy for test set: {0:.3f}'.format(knn.score(Dim_2_X_test, y_test)))
y_true, y_pred = y_test, knn.predict(Dim_2_X_test)
# Classification Report
print('\n{}'.format(classification_report(y_true, y_pred)))
# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
print('\nConfusion Matrix:\n', cm)
# Accuracy Score (note: the variable is named `auc` but it holds plain accuracy, not area under the curve)
auc = accuracy_score(y_true, y_pred)
print('\nAccuracy Score:\n', auc.round(3))
k-Nearest Neighbor Classifier Scores without Hyperparameter Tuning
k-NN accuracy for train set: 0.554
k-NN accuracy for test set: 0.320
precision recall f1-score support
0 0.26 0.36 0.30 14
1 0.00 0.00 0.00 23
2 0.57 0.67 0.62 12
3 0.33 0.08 0.13 12
4 0.28 0.88 0.42 8
5 0.10 0.19 0.13 21
6 0.26 0.40 0.32 15
7 0.75 0.40 0.52 15
8 0.50 0.20 0.29 15
9 0.24 0.41 0.30 17
10 0.41 0.37 0.39 19
11 0.00 0.00 0.00 18
12 0.77 0.77 0.77 22
13 0.50 0.08 0.14 12
14 0.12 0.20 0.15 15
15 0.26 0.50 0.35 18
16 1.00 0.19 0.32 16
accuracy 0.32 272
macro avg 0.37 0.34 0.30 272
weighted avg 0.36 0.32 0.30 272
Confusion Matrix:
[[ 5 0 0 0 2 4 1 0 0 0 0 0 0 0 2 0 0]
[ 2 0 2 0 0 6 0 0 2 3 3 0 0 0 1 4 0]
[ 1 0 8 0 0 1 0 1 0 0 0 0 0 0 0 1 0]
[ 1 0 0 1 3 4 2 0 0 0 0 0 0 0 1 0 0]
[ 0 0 0 0 7 0 1 0 0 0 0 0 0 0 0 0 0]
[ 1 0 0 0 0 4 1 0 0 7 1 0 0 0 5 2 0]
[ 1 0 0 0 4 0 6 0 0 1 0 0 0 0 3 0 0]
[ 0 0 3 0 0 3 0 6 0 0 0 0 1 0 1 1 0]
[ 0 0 0 1 1 1 2 1 3 1 2 0 0 0 0 3 0]
[ 0 0 0 0 0 4 2 0 1 7 0 0 0 0 2 1 0]
[ 1 1 0 0 0 1 2 0 0 4 7 0 0 0 0 3 0]
[ 0 0 1 0 1 2 0 0 0 3 3 0 1 0 1 6 0]
[ 1 0 0 0 1 0 0 0 0 1 0 0 17 0 0 2 0]
[ 1 0 0 0 1 3 1 0 0 1 0 0 0 1 3 1 0]
[ 1 0 0 1 2 5 1 0 0 1 0 0 0 0 3 1 0]
[ 2 0 0 0 0 2 0 0 0 0 1 0 1 0 3 9 0]
[ 2 0 0 0 3 0 4 0 0 0 0 0 2 1 1 0 3]]
Accuracy Score:
0.32
train_score=knn.score(Dim_2_X_train, y_train)
train_score
0.5542279411764706
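The original comment mentions scaling, but no scaler is actually fitted; since the pixels are already in [0, 1] this matters little here. If scaling were intended, the standard pattern is a Pipeline so the scaler is fit on training data only. A hedged sketch on synthetic stand-in data (not the notebook's arrays):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Tiny synthetic stand-in data.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 10))
y_demo = (X_demo[:, 0] > 0).astype(int)

# The pipeline fits the scaler on the training data and re-applies the
# same transform at prediction time, avoiding train/test leakage.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_demo, y_demo)
print(knn_scaled.score(X_demo, y_demo))
```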
# display the confusion matrix
print ("[INFO] confusion matrix for KNN model")
# get the list of training labels
#labels = unique
# plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm,
annot=True,
cmap="Set2")
plt.show()
[INFO] confusion matrix for KNN model
## DataFrame.append was removed in pandas 2.0; pd.concat is the supported way to add a row
new_row = pd.DataFrame([{'Model':'KNeighborsClassifier',
'Accuracy': train_score,
'Test Accuracy':auc.round(3),
'Loss': 'NA',
'Test Loss':'NA'}])
model_performance_flower_cl = pd.concat([model_performance_flower_cl, new_row], ignore_index=True)
model_performance_flower_cl
|   | Model | Accuracy | Test Accuracy | Loss | Test Loss |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 1.000000 | 0.511029 | NA | NA |
| 1 | KNeighborsClassifier | 0.554228 | 0.320000 | NA | NA |
#Import svm model
from sklearn import svm
#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
#Train the model using the training sets
clf.fit(Dim_2_X_train, y_train)
#Predict the response for test dataset
y_pred = clf.predict(Dim_2_X_test)
## metrics
model_score = clf.score(Dim_2_X_test, y_test)
print(model_score)
0.47058823529411764
model_train_score = clf.score(Dim_2_X_train, y_train)
print(model_train_score)
1.0
print(classification_report(y_test, y_pred)) # ground truth comes first, predictions second
precision recall f1-score support
0 0.50 0.58 0.54 12
1 0.26 0.26 0.26 23
2 0.75 0.47 0.58 19
3 0.58 0.58 0.58 12
4 0.62 0.42 0.50 12
5 0.33 0.35 0.34 20
6 0.33 0.42 0.37 12
7 0.47 0.44 0.45 16
8 0.60 0.53 0.56 17
9 0.47 0.57 0.52 14
10 0.63 0.55 0.59 22
11 0.33 0.55 0.41 11
12 0.73 0.80 0.76 20
13 0.33 0.36 0.35 11
14 0.20 0.12 0.15 24
15 0.44 0.44 0.44 18
16 0.56 1.00 0.72 9
accuracy 0.47 272
macro avg 0.48 0.50 0.48 272
weighted avg 0.48 0.47 0.46 272
# display the confusion matrix
print ("[INFO] confusion matrix for SVM model")
# get the list of training labels
#labels = unique
# plot the confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm,
annot=True,
cmap="Set2")
plt.show()
[INFO] confusion matrix for SVM model
## DataFrame.append was removed in pandas 2.0; pd.concat is the supported way to add a row
new_row = pd.DataFrame([{'Model':'Support Vector Classifier',
'Accuracy': model_train_score,
'Test Accuracy':model_score,
'Loss': 'NA',
'Test Loss':'NA'}])
model_performance_flower_cl = pd.concat([model_performance_flower_cl, new_row], ignore_index=True)
model_performance_flower_cl
|   | Model | Accuracy | Test Accuracy | Loss | Test Loss |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 1.000000 | 0.511029 | NA | NA |
| 1 | KNeighborsClassifier | 0.554228 | 0.320000 | NA | NA |
| 2 | Support Vector Classifier | 1.000000 | 0.470588 | NA | NA |
Comments and Observation: all three classical models are trained on the flattened raw pixels (150,528 features). Logistic Regression and the SVM fit the training set perfectly (accuracy 1.0) but reach only ~0.51 and ~0.47 on test, while KNN scores just 0.32, indicating that these models overfit heavily and cannot exploit spatial structure in the images.
C. Train a model using Neural Network and share performance metrics on test data. [4 Marks]
The Sequential model in tensorflow.keras expects image data in the format (n_e, n_h, n_w, n_c), where n_e = number of examples, n_h = height, n_w = width, and n_c = number of channels. The labels are not reshaped; their one-hot-encoded versions below are used for the neural network models.
print('Train data shape')
print('---------------')
print(X_train.shape)
print(y_train.shape)
print('Test data shape')
print('---------------')
print(X_test.shape)
print(y_test.shape)
Train data shape
---------------
(1088, 224, 224, 3)
(1088,)
Test data shape
---------------
(272, 224, 224, 3)
(272,)
y_train_one_hot=to_categorical(y_train)
y_test_one_hot=to_categorical(y_test)
print('y_train Shape', y_train.shape)
#print('y_val Shape', y_val.shape)
print('y_test Shape',y_test.shape)
y_train Shape (1088,)
y_test Shape (272,)
print('y_train Shape', y_train_one_hot.shape)
#print('y_val Shape', y_val.shape)
print('y_test Shape',y_test_one_hot.shape)
y_train Shape (1088, 17)
y_test Shape (272, 17)
print(y_train_one_hot)
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]]
print('The one hot label is', y_train_one_hot[index])
X_train.max()
1.0
X_train.min()
0.0
output_classes=len(np.unique(y_train))
output_classes
17
def create_ann_model(height, width, num_channels, num_classes, loss='categorical_crossentropy', metrics=['accuracy']):
# batch_size = None
model = Sequential()
model.add(InputLayer(input_shape=(height, width, num_channels)))
model.add(Flatten())
model.add(BatchNormalization())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(num_classes, activation = "softmax"))
opt = tf.keras.optimizers.Adam(learning_rate=0.000001) # the `lr` argument is deprecated; use `learning_rate`
model.compile(optimizer = opt, loss = loss, metrics = metrics)
model.summary()
return model
height, width, num_channels=X_train.shape[1:]
ann_model = create_ann_model(height, width, num_channels,output_classes)
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten_1 (Flatten)                         (None, 150528)       0
batch_normalization_4 (BatchNormalization)  (None, 150528)       602112
dense_4 (Dense)                             (None, 1024)         154141696
dropout_3 (Dropout)                         (None, 1024)         0
batch_normalization_5 (BatchNormalization)  (None, 1024)         4096
dense_5 (Dense)                             (None, 512)          524800
dropout_4 (Dropout)                         (None, 512)          0
batch_normalization_6 (BatchNormalization)  (None, 512)          2048
dense_6 (Dense)                             (None, 256)          131328
dropout_5 (Dropout)                         (None, 256)          0
batch_normalization_7 (BatchNormalization)  (None, 256)          1024
dense_7 (Dense)                             (None, 17)           4369
=================================================================
Total params: 155,411,473
Trainable params: 155,106,833
Non-trainable params: 304,640
_________________________________________________________________
# Re-compile the model with the Adam optimizer at its default learning rate
# (this overrides the very small learning rate set inside create_ann_model)
## Categorical cross-entropy is used as this is a multiclass classification problem
ann_model.compile(loss="categorical_crossentropy", metrics=["accuracy"], optimizer="adam")
history = ann_model.fit(X_train,
y_train_one_hot,
epochs = 50,
validation_data = (X_test,y_test_one_hot),
batch_size = 100)
Train on 1088 samples, validate on 272 samples.
Training log (50 epochs, condensed): training loss fell from 2.8034 to 0.0168 (accuracy 0.2059 → 0.9963), while validation loss dropped from 10.1195 to a minimum of about 1.61 around epoch 23 and then drifted back up to 1.9458; validation accuracy climbed from 0.1103 to a peak of 0.5625 (epochs 43–44), ending at 0.5074.
def plot_accuracy_loss(history):
accuracy = history.history['acc']
val_accuracy = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(accuracy)) # Get number of epochs
plt.plot ( epochs, accuracy, label = 'training accuracy' )
plt.plot ( epochs, val_accuracy, label = 'validation accuracy' )
plt.title ('Training and validation accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(loc = 'lower right')
plt.figure()
plt.plot ( epochs, loss, label = 'training loss' )
plt.plot ( epochs, val_loss, label = 'validation loss' )
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(loc = 'upper right')
plt.title ('Training and validation loss')
# Final evaluation of the model
def evaluate_model(model,X_test,y_test):
scores = model.evaluate(X_test, y_test, verbose=0)
print("Loss:", scores[0])
print("Accuracy:", scores[1])
evaluate_model(ann_model,X_test,y_test_one_hot)
Loss: 1.9457851578207577
Accuracy: 0.50735295
plot_accuracy_loss(history)
ann_train_acc=history.history['acc'][-1]
ann_test_acc=history.history['val_acc'][-1]
ann_train_loss=history.history['loss'][-1]
ann_test_loss=history.history['val_loss'][-1]
print('Training accuracy',ann_train_acc,'ann_test_acc',ann_test_acc,'train_loss',ann_train_loss,'test_loss',ann_test_loss)
Training accuracy 0.9963235259056091 ann_test_acc 0.5073529481887817 train_loss 0.016763041751212713 test_loss 1.9457850947099573
## DataFrame.append was removed in pandas 2.0; pd.concat is the supported way to add a row
new_row = pd.DataFrame([{'Model':'Neural Network',
'Accuracy': ann_train_acc,
'Test Accuracy':ann_test_acc,
'Loss': ann_train_loss,
'Test Loss':ann_test_loss}])
model_performance_flower_cl = pd.concat([model_performance_flower_cl, new_row], ignore_index=True)
model_performance_flower_cl
|   | Model | Accuracy | Test Accuracy | Loss | Test Loss |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 1.000000 | 0.511029 | NA | NA |
| 1 | KNeighborsClassifier | 0.554228 | 0.320000 | NA | NA |
| 2 | Support Vector Classifier | 1.000000 | 0.470588 | NA | NA |
| 3 | Neural Network | 0.996324 | 0.507353 | 0.016763 | 1.945785 |
D. Train a model using a basic CNN and share performance metrics on test data. [4 Marks]
def build_cnn_model(height, width, num_channels, num_classes, loss='categorical_crossentropy', metrics=['accuracy']):
model = Sequential()
model.add(Conv2D(32, (5,5), activation ='relu', input_shape = (height, width, num_channels)))
model.add(MaxPooling2D(pool_size=3))
#model.add(Dropout(0.2))
model.add(Conv2D(filters=64, kernel_size=4, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))
model.add(Conv2D(filters=128, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Conv2D(filters=128, kernel_size=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))
model.add(Flatten())
# fully connected layer
model.add(Dense(units = 500, activation = 'relu'))
model.add(Dropout(0.2))
# output layer
model.add(Dense(units = num_classes, activation = 'softmax'))
model.summary()
return model
cnn_model = build_cnn_model(height, width, num_channels,output_classes)
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D)                    (None, 220, 220, 32)      2432
max_pooling2d (MaxPooling2D)       (None, 73, 73, 32)        0
conv2d_1 (Conv2D)                  (None, 73, 73, 64)        32832
max_pooling2d_1 (MaxPooling2D)     (None, 36, 36, 64)        0
conv2d_2 (Conv2D)                  (None, 36, 36, 128)       73856
max_pooling2d_2 (MaxPooling2D)     (None, 18, 18, 128)       0
dropout_6 (Dropout)                (None, 18, 18, 128)       0
conv2d_3 (Conv2D)                  (None, 18, 18, 128)       65664
max_pooling2d_3 (MaxPooling2D)     (None, 9, 9, 128)         0
dropout_7 (Dropout)                (None, 9, 9, 128)         0
flatten_2 (Flatten)                (None, 10368)             0
dense_8 (Dense)                    (None, 500)               5184500
dropout_8 (Dropout)                (None, 500)               0
dense_9 (Dense)                    (None, 17)                8517
=================================================================
Total params: 5,367,801
Trainable params: 5,367,801
Non-trainable params: 0
_________________________________________________________________
## compile the model
cnn_model.compile(loss="categorical_crossentropy", metrics=["accuracy"], optimizer="adam")
def fit_model(model,X_train,y_train,X_val,y_val,epochs,batch_size):
history = model.fit(X_train, y_train,epochs=epochs,batch_size=batch_size,validation_data=(X_val, y_val))
return history
epochs=50
batchsize=100
cnn_history=fit_model(cnn_model,X_train,y_train_one_hot,X_test, y_test_one_hot,epochs,batchsize)
Train on 1088 samples, validate on 272 samples.
Training log (50 epochs, condensed): training loss fell from 2.7453 to 0.0365 (accuracy 0.1278 → 0.9899); validation loss reached its minimum of 1.0962 at epoch 12 and then rose to 2.3798, while validation accuracy climbed from 0.1397 to a peak of 0.6838 (epoch 48), ending at 0.6103.
## Evaluate model
evaluate_model(cnn_model,X_test,y_test_one_hot)
Loss: 2.3798262231490193 Accuracy: 0.6102941
print('training acc.:',cnn_history.history['acc'][-1],'\n','test acc.:', (cnn_history.history['val_acc'])[-1])
training acc.: 0.9898896813392639 test acc.: 0.6102941036224365
cnn_train_acc=cnn_history.history['acc'][-1]
cnn_test_acc=cnn_history.history['val_acc'][-1]
cnn_train_loss=cnn_history.history['loss'][-1]
cnn_test_loss=cnn_history.history['val_loss'][-1]
print('Training accuracy',cnn_train_acc,'cnn_test_acc',cnn_test_acc,'train_loss',cnn_train_loss,'test_loss',cnn_test_loss)
Training accuracy 0.9898896813392639 cnn_test_acc 0.6102941036224365 train_loss 0.03645769410479047 test_loss 2.3798262301613304
plot_accuracy_loss(cnn_history)
from sklearn.metrics import confusion_matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        # Normalize each row (true class) so cells show proportions;
        # this must happen before plotting so image and annotations agree
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    # Annotate each cell, using white text on dark cells for readability
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
num_classes=17
import itertools
plt.subplots(figsize=(22,7)) #set the size of the plot
# Predict the values from the validation dataset
y_pred = cnn_model.predict(X_test)
# Convert predicted probabilities to class indices
y_pred_classes = np.argmax(y_pred,axis = 1)
# Convert one-hot test labels back to class indices
y_true = np.argmax(y_test_one_hot,axis = 1)
# compute the confusion matrix
confusion_mtx = confusion_matrix(y_true, y_pred_classes)
# plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes = range(num_classes))
# confusion matrix
# display the confusion matrix
print ("[INFO] confusion matrix")
# get the list of training labels
#labels = unique
# plot the confusion matrix
cm = confusion_matrix(y_true, y_pred_classes)
sns.heatmap(cm,
            annot=True,
            fmt='d',  # show raw counts as integers
            cmap="Set2")
plt.show()
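Beyond the confusion matrix, per-class precision and recall make it easier to see which species the model confuses; `classification_report` is already imported at the top of the notebook. A minimal, self-contained sketch on small synthetic labels (in the notebook, `y_true` and `y_pred_classes` would be passed instead):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Synthetic stand-ins for y_true / y_pred_classes (3 classes)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall and F1 in one table
print(classification_report(y_true, y_pred, digits=3))
```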
[INFO] confusion matrix
model_performance_flower_cl=model_performance_flower_cl.append({'Model':'CNN',
'Accuracy': cnn_train_acc,
'Test Accuracy':cnn_test_acc,
'Loss': cnn_train_loss,
'Test Loss':cnn_test_loss
}, ignore_index=True)
model_performance_flower_cl
| | Model | Accuracy | Test Accuracy | Loss | Test Loss |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 1.000000 | 0.511029 | NA | NA |
| 1 | KNeighborsClassifier | 0.554228 | 0.320000 | NA | NA |
| 2 | Support Vector Classifier | 1.000000 | 0.470588 | NA | NA |
| 3 | Neural Network | 0.996324 | 0.507353 | 0.016763 | 1.945785 |
| 4 | CNN | 0.989890 | 0.610294 | 0.036458 | 2.379826 |
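A note on the bookkeeping above: `DataFrame.append` was deprecated and removed in pandas 2.0, so newer environments need `pd.concat` to add a row to the results table. A sketch with the same columns (the prior row shown is abbreviated from the table above):

```python
import pandas as pd

# Results table with the same columns as model_performance_flower_cl,
# abbreviated to one prior row for this sketch
results = pd.DataFrame([{'Model': 'Neural Network', 'Accuracy': 0.996324,
                         'Test Accuracy': 0.507353, 'Loss': 0.016763,
                         'Test Loss': 1.945785}])

# pd.concat replaces DataFrame.append (removed in pandas 2.0)
new_row = pd.DataFrame([{'Model': 'CNN', 'Accuracy': 0.989890,
                         'Test Accuracy': 0.610294, 'Loss': 0.036458,
                         'Test Loss': 2.379826}])
results = pd.concat([results, new_row], ignore_index=True)
print(results)
```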
E. Predict the class/label of image ‘Prediction.jpg’ using best performing model and share predicted label. [2 Marks]
def predict_one_image(img, model):
    img_width = 224
    img_height = 224
    # Resize to the input size the network was trained on
    img = cv2.resize(img, (img_width, img_height), interpolation=cv2.INTER_CUBIC)
    # Add a batch dimension: (1, height, width, channels)
    img = np.reshape(img, (1, img_width, img_height, 3))
    # Scale pixel values to [0, 1], matching the training preprocessing
    img = img / 255.
    pred = model.predict(img)
    print(pred)
    class_num = np.argmax(pred)
    print(class_num)
    return class_num, np.max(pred)
import cv2
input_image = cv2.imread('prediction.jpg')
input_image_height, input_image_width, input_image_channels = input_image.shape
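One caveat with `cv2.imread` is that it returns images in BGR channel order; if the training images were loaded as RGB, the channels should be swapped before prediction or the colour information will be mismatched. A minimal sketch of the swap using NumPy slicing (equivalent to `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`, shown here on a tiny synthetic array so it is self-contained):

```python
import numpy as np

# A tiny 1x2 "image" in BGR order: one pure-blue pixel, one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis converts BGR -> RGB
rgb = bgr[..., ::-1]
print(rgb[0, 0])  # the blue pixel becomes [0, 0, 255] in RGB order
```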
# Import required libraries
from matplotlib import pyplot as plt
from matplotlib import image as mpimg
plt.title('Showing prediction image') # Adds title
plt.xlabel('X pixel scaling') # Adds X label
plt.ylabel('Y pixel scaling') # Adds Y label
im = mpimg.imread('prediction.jpg') # Reads the image
plt.imshow(im)
<matplotlib.image.AxesImage at 0x1bdc7217dc0>
cnn_predict_class, cnn_pred_proba=predict_one_image(input_image,cnn_model)
[[2.5929097e-09 2.6053343e-08 9.5345688e-01 1.0034416e-17 3.3164508e-13 5.3144469e-15 1.3364016e-17 4.0392064e-02 2.2477822e-11 4.3167070e-09 1.3762076e-08 1.2708427e-09 6.1496366e-03 3.3337457e-17 5.3585349e-17 1.3868837e-06 1.0143770e-22]] 2
print('Predicted class with CNN model:', cnn_predict_class)
Predicted class with CNN model: 2
print('Probability of predicted class with CNN:', cnn_pred_proba)
Probability of predicted class with CNN: 0.9534569
plt.imshow(X[261])
<matplotlib.image.AxesImage at 0x1bdc1cd9a30>
y[261]
2
The CNN model predicts the image correctly as label 2, which matches the label y[261] of the same image in the dataset.
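The model returns a numeric class index; to report a species name it would have to be mapped back through the label encoder fitted during preprocessing. A sketch using `LabelEncoder.inverse_transform` (the label list below is hypothetical, standing in for the notebook's actual fitted encoder, which is not shown in this section):

```python
from sklearn import preprocessing

# Hypothetical label list standing in for the notebook's fitted encoder
labels = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']
le = preprocessing.LabelEncoder()
le.fit(labels)  # LabelEncoder assigns indices in sorted order

class_num = 2  # index as returned by np.argmax(pred)
print('Predicted species:', le.inverse_transform([class_num])[0])
```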
model_performance_flower_cl
| | Model | Accuracy | Test Accuracy | Loss | Test Loss |
|---|---|---|---|---|---|
| 0 | Logistic Regression | 1.000000 | 0.511029 | NA | NA |
| 1 | KNeighborsClassifier | 0.554228 | 0.320000 | NA | NA |
| 2 | Support Vector Classifier | 1.000000 | 0.470588 | NA | NA |
| 3 | Neural Network | 0.996324 | 0.507353 | 0.016763 | 1.945785 |
| 4 | CNN | 0.989890 | 0.610294 | 0.036458 | 2.379826 |
In this project, we applied several approaches to train classifiers on the Oxflower dataset. Among the models tried, the CNN performed best on the test data, reaching a test accuracy of 61.03% against a training accuracy of 98.99%; the large gap between the two indicates overfitting.
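Given the gap between training and validation performance, one simple improvement is to keep the weights from the epoch with the lowest validation loss instead of the final epoch; in Keras this is what `tf.keras.callbacks.EarlyStopping(restore_best_weights=True)` or `ModelCheckpoint(save_best_only=True)` provides. A minimal sketch of picking that epoch from a history-like record (the values below are copied from the last few epochs of the training log above):

```python
import numpy as np

# val_loss / val_acc for epochs 44-50, taken from the training log above
val_loss = [2.0455, 2.0194, 2.1537, 2.0323, 2.1692, 2.3965, 2.3798]
val_acc  = [0.6250, 0.6471, 0.6471, 0.6581, 0.6838, 0.6507, 0.6103]

# The "best" epoch is the one with minimum validation loss
best = int(np.argmin(val_loss))
print('best epoch offset:', best, '| val_acc there:', val_acc[best])
```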
Advantages of the Convolutional Neural Network
The CNN model was able to predict the class of the prediction image correctly, which was verified by comparing it against the same image and its label in the dataset.